Neighbor-Sensitive Hashing
نویسندگان
چکیده
Approximate kNN (k-nearest neighbor) techniques using binary hash functions are among the most commonly used approaches for overcoming the prohibitive cost of performing exact kNN queries. However, the success of these techniques largely depends on their hash functions’ ability to distinguish kNN items; that is, the kNN items retrieved based on data items’ hashcodes, should include as many true kNN items as possible. A widely-adopted principle for this process is to ensure that similar items are assigned to the same hashcode so that the items with the hashcodes similar to a query’s hashcode are likely to be true neighbors. In this work, we abandon this heavily-utilized principle and pursue the opposite direction for generating more effective hash functions for kNN tasks. That is, we aim to increase the distance between similar items in the hashcode space, instead of reducing it. Our contribution begins by providing theoretical analysis on why this revolutionary and seemingly counter-intuitive approach leads to a more accurate identification of kNN items. Our analysis is followed by a proposal for a hashing algorithm that embeds this novel principle. Our empirical studies confirm that a hashing algorithm based on this counter-intuitive idea significantly improves the efficiency and accuracy of state-of-the-art techniques.
منابع مشابه
Fast Approximate Nearest-Neighbor Field by Cascaded Spherical Hashing
We present an e cient and fast algorithm for computing approximate nearest neighbor elds between two images. Our method builds on the concept of Coherency-Sensitive Hashing (CSH), but uses a recent hashing scheme, Spherical Hashing (SpH), which is known to be better adapted to the nearest-neighbor problem for natural images. Cascaded Spherical Hashing concatenates di erent con gurations of SpH ...
متن کاملLocality-Sensitive Hashing for Data with Categorical and Numerical Attributes Using Dual Hashing
Locality-sensitive hashing techniques have been developed to efficiently handle nearest neighbor searches and similar pair identification problems for large volumes of high-dimensional data. This study proposes a locality-sensitive hashing method that can be applied to nearest neighbor search problems for data sets containing both numerical and categorical attributes. The proposed method makes ...
متن کاملTwo-Stage Hashing for Fast Document Retrieval
This work fulfills sublinear time Nearest Neighbor Search (NNS) in massivescale document collections. The primary contribution is to propose a two-stage unsupervised hashing framework which harmoniously integrates two state-of-theart hashing algorithms Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH accounts for neighbor candidate pruning, while ITQ provides an efficient ...
متن کاملDeveloping a Good Hash Function for LSH
In the previous lecture we saw how to design locality sensitive hashing (LSH) for hamming and l1 distance, as a solution to the (c,R)-Near Neighbor problem. Whenever we use LSH for the nearest neighbor search, using some distance measure, the main task is to come up with a good elementary hashing function. This elementary hash function (H) is then used to create a composite hash function (G) fo...
متن کاملlsh, Nearest neighbor search in high dimensions
Calculating distance pairs is O(n2) in memory and time and finding the nearest neighbor is O(n) in time. Tree indexing techniques like kd-tree [2] were developed to cope with large n, however their performance quickly breaks down for p > 3 [3]. Locality sensitive hashing (LSH) [3] is a technique for generating hash numbers from high dimensional data, such that nearby points have identical hashe...
متن کاملEfficient Search in Document Image Collections
This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting word image matching may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic timewarping and/or nearest neighbor sea...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 9 شماره
صفحات -
تاریخ انتشار 2015